Abstract

The ALICE High Level Trigger has to process data online, 
in order to select interesting (sub)events, or to compress data 
efficiently by modeling techniques. Focusing on the main data 
source, the Time Projection Chamber (TPC), we present two 
pattern recognition methods under investigation: a sequential 
approach (cluster finder and track follower) and an iterative 
approach (track candidate finder and cluster deconvoluter). We
show that the former is suited for pp and low multiplicity
PbPb collisions, whereas the latter might be applicable for high
multiplicity PbPb collisions if it turns out that more than 8000
charged particles have to be reconstructed inside the TPC.
Based on the developed tracking schemes, we show that using
modeling techniques a compression factor of around 10 might
be achievable.



I. Introduction 

The ALICE Experiment [1] at the upcoming Large Hadron 
Collider at CERN will investigate PbPb collisions at a center 
of mass energy of about 5.5 TeV per nucleon pair and pp 
collisions at 14 TeV. Its main tracking detector, the Time
Projection Chamber (TPC), is read out by 557568 analog-to-digital
channels (ADCs), producing a data size of about 75 MByte
per event for central PbPb collisions and around 0.5 MByte
for pp collisions at the highest assumed multiplicities [2].

The event rate is limited by the bandwidth of the permanent 
storage system. Without any further reduction or compression 
the ALICE TPC detector can only take central PbPb events 
up to 20 Hz and min. bias pp events at a few 100 Hz. 
Significantly higher rates are possible by either selecting 
interesting (sub)events, or compressing data efficiently by 
modeling techniques. Both require pattern recognition to be
performed online. In order to process the detector information
of 10-25 GByte/sec, a massively parallel computing system is
needed, the High Level Trigger (HLT) system.

A. Functionality 

The HLT system is intended to reduce the data rate produced
by the detectors as far as possible, in order to keep the cost
of writing the data to tape reasonable. The key component of
the system is the ability to process the raw data by performing
track pattern recognition in real time. Based on the extracted
information (clusters and tracks), data reduction can be done
in different ways:

• Trigger: Generation and application of a software trigger 
capable of selecting interesting events from the input data 
stream. 

• Select: Reduction in the size of the event data by selecting
sub-events or regions of interest.

• Compression: Reduction in the size of the event data by 
compression techniques. 

As such, the HLT system will enable the ALICE TPC
detector to run at a rate of up to 200 Hz for heavy ion
collisions, and up to 1 kHz for pp collisions. In order to
increase the statistical significance of rare processes,
dedicated triggers can select candidate events or sub-events.
By analyzing tracking information from the different detectors
and (pre-)triggers online, selective or partial readout of the
relevant detectors can be performed, thus reducing the event
rate. The tasks of such a trigger are selections based upon the
online reconstructed track parameters of the particles, e.g. to
select events which contain e+e- candidates coming from
quarkonium decay or to select events containing high energy
jets made out of collimated beams of high p_t particles [3]. In
the case of low multiplicity events such as pp collisions, the
online reconstruction can be used to remove pile-up events
from the trigger event.
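
The data volumes implied by these rates follow from simple arithmetic. The sketch below uses only the event sizes and rates quoted in the text; the function name and the convention of 1 GByte = 1000 MByte are illustrative assumptions.

```python
def data_rate_gbyte_per_s(event_size_mbyte, rate_hz):
    """Raw TPC data rate in GByte/s, assuming 1 GByte = 1000 MByte."""
    return event_size_mbyte * rate_hz / 1000.0


# Event sizes and rates as quoted in the text.
pbpb_storage_limited = data_rate_gbyte_per_s(75.0, 20)   # central PbPb at 20 Hz
pbpb_with_hlt = data_rate_gbyte_per_s(75.0, 200)         # central PbPb at 200 Hz
pp_with_hlt = data_rate_gbyte_per_s(0.5, 1000)           # pp at 1 kHz

print(pbpb_storage_limited, pbpb_with_hlt, pp_with_hlt)
```

Under these assumptions, central PbPb events at 200 Hz correspond to about 15 GByte/s, which lies within the 10-25 GByte/sec range quoted in the Introduction.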

B. Architecture 

The HLT system receives data from the front-end electronics.
A farm of clustered SMP-nodes (about 500 to 1000 nodes),
based on off-the-shelf PCs and connected with a high
bandwidth, low latency network, provides the necessary
computing power. The hierarchy of the farm has to be adapted
both to the parallelism in the data flow and to the complexity
of the pattern recognition.

Figure 1 shows a sketch of the architecture of the system.
The TPC detector consists of 36 sectors, each sector being
divided into 6 sub-sectors. The data from each sub-sector are
transferred via an optical fiber from the detector front-end
into 216 custom designed readout receiver cards (RORCs).
Each receiver node is interfaced to a RORC using its internal
PCI bus. In addition to the different communication interfaces,
the RORCs provide an FPGA co-processor for the data intensive
tasks of the pattern recognition and enough external memory
to store several dozen event fractions. A hierarchical network
interconnects all the receiver nodes.
possibly deconvoluting overlapping clusters shared by different
tracks. In both cases, a helix fit on the assigned clusters
finally determines the track parameters.

In order to reduce data shipping and communication overhead
within the HLT, as much as possible of the local pattern
recognition will be done on the RORC. We therefore intend to
run the Cluster Finder or the Track Candidate Finder directly 
on the FPGA co-processor of the receiver nodes while reading 
out the data over the fiber. In both cases the results, cluster 
centroids or track candidate parameters, will be sent from the 
RORC to the host over the PCI bus. 



Fig. 1. Architecture of the HLT system.



Each sector is processed in parallel, and the results are then
merged at a higher level. The first layer of nodes receives the
data from the detector and performs the pre-processing task,
i.e. cluster and track seeding on the sub-sector level. The next
two levels of nodes exploit the local neighborhood: track
segment finding on the sector level. Finally, all local results
are collected from the sectors or from different detectors and
combined on a global level: track segment merging and final
track fitting.

The farm is designed to be completely fault tolerant, avoiding
all single points of failure except for the unique detector
links. A generic communication framework has been
developed based on the publisher-subscriber principle, which
allows any hierarchy of communication processing elements
to be constructed [4].

II. Online Pattern Recognition 

The main task of the HLT system is to reconstruct the 
complete event information online. Concerning the TPC and 
the other tracking devices, the particles should ideally follow 
helical trajectories due to the solenoidal magnetic field of the 
L3 magnet, in which these detectors are embedded. Thus we
model a track by a helix with 5(+1) parameters describing it
mathematically. A track is made out of clusters, so the pattern
recognition task is to extract clusters from the raw data and
to assign them to tracks, thereby determining the helix track
parameters.

For HLT tracking, we distinguish two different approaches: 
the sequential feature extraction and the iterative feature 
extraction. 

The sequential method, corresponding to the conventional
way of event reconstruction, first searches for the cluster
centroids with a Cluster Finder and then uses a Track Follower
on these space points to extract the track parameters. This
approach is applicable for lower occupancy, as in pp and low
multiplicity PbPb collisions. However, at the larger
multiplicities expected for PbPb at the LHC, clusters start to
overlap and deconvolution becomes necessary in order to
achieve the desired tracking efficiencies.

For that reason, the iterative method first determines track
candidates using a suitably defined Track Candidate Finder
and then assigns clusters to tracks using a Cluster Evaluator



III. Sequential tracking approach 

The classical approach of pattern recognition in the TPC 
is divided into two sequential steps: Cluster finding and track 
finding. In the first step the Cluster Finder reconstructs the 
cluster centroids, which are interpreted as the three dimen- 
sional space points produced by the traversing particles. The 
list of space points is then passed to the Track Follower, 
which combines the clusters to form track segments. A similar 
reconstruction chain has successfully been used in the STAR 
L3 trigger [5], and thus has been adapted to the ALICE HLT 
framework. 

1) The Cluster Finder: The input to the cluster finder is
a list of above-threshold timebin sequences for each pad.
The algorithm builds the clusters by matching sequences on
neighboring pads. In order to speed up the execution time,
every calculation is performed on-the-fly: sequence centroid
calculation, sequence matching and deconvolution. Hence the
loop over sequences is done only once. Only two lists of
sequences are stored at any time: those of the current pad and
those of the previous pad(s). For every new sequence the
centroid position in the time direction is calculated as the
ADC-weighted mean. The mean is then added to the current pad
list, and compared to the sequences in the previous pad list.
If a match is found, the mean position in both pad and time is
calculated and the cluster list is updated. Every time a match
is not found, the sequence is regarded as a new cluster.
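
The single-pass matching described above can be sketched as follows. This is a minimal illustration and not the HLT implementation: the input layout (a dictionary mapping pad number to a list of `(first_timebin, adc_values)` sequences), the function name `find_clusters` and the matching window `max_time_diff` are assumptions made for the example, and the deconvolution step is left out.

```python
def find_clusters(pad_sequences, max_time_diff=2.0):
    """Single-pass cluster finder for one padrow (illustrative sketch).

    pad_sequences: dict mapping pad number -> list of (first_timebin, [adc, ...]).
    Returns a list of (pad_centroid, time_centroid, total_charge).
    """
    finished = []   # clusters that were not continued on the next pad
    previous = []   # clusters that received a sequence on the previous pad
    prev_pad = None
    for pad in sorted(pad_sequences):
        # A gap in pad numbers closes all open clusters.
        if prev_pad is not None and pad != prev_pad + 1:
            finished.extend(previous)
            previous = []
        current = []
        for first_bin, adcs in pad_sequences[pad]:
            charge = sum(adcs)
            # ADC-weighted mean position of the sequence in time.
            mean_t = sum((first_bin + i) * a for i, a in enumerate(adcs)) / charge
            match = None
            for cluster in previous:
                if abs(cluster["last_t"] - mean_t) <= max_time_diff:
                    match = cluster
                    break
            if match is None:
                # No match on the previous pad: start a new cluster.
                match = {"q": 0.0, "qpad": 0.0, "qtime": 0.0, "last_t": mean_t}
            else:
                previous.remove(match)
            match["q"] += charge
            match["qpad"] += charge * pad
            match["qtime"] += charge * mean_t
            match["last_t"] = mean_t
            current.append(match)
        finished.extend(previous)   # clusters that ended on the previous pad
        previous = current
        prev_pad = pad
    finished.extend(previous)
    return [(c["qpad"] / c["q"], c["qtime"] / c["q"], c["q"]) for c in finished]


# Hypothetical input: one cluster spanning pads 10-12 and one isolated sequence.
pads = {
    10: [(100, [2, 6, 2])],
    11: [(100, [4, 12, 4])],
    12: [(101, [1, 3, 1]), (150, [5, 5])],
}
clusters = find_clusters(pads)
```

Each sequence is visited exactly once, and only the clusters touched on the previous pad are kept for matching, which mirrors the two-list bookkeeping described in the text.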

In the case of overlapping clusters, a crude deconvolution
scheme can be performed (footnote 1). In the time direction,
overlapping sequences are identified by a local minimum in a
sequence, and are separated by cutting at the position of the
minimum. The same approach is used for the pad direction,
where the cluster is cut if there is a local minimum of the pad
charge values.
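
Cutting a sequence at a local minimum can be illustrated with a few lines of code. The sketch below is an assumption-laden example: the function name is hypothetical, only strict interior minima are treated as cut points, and the sample at the minimum is assigned to the following piece, a choice the text does not specify.

```python
def split_at_minima(first_bin, adcs):
    """Split one timebin sequence at strict interior local minima.

    Returns a list of (first_timebin, [adc, ...]) pieces. The sample at the
    minimum is assigned to the following piece (an arbitrary choice here).
    """
    pieces = []
    start = 0
    for i in range(1, len(adcs) - 1):
        if adcs[i] < adcs[i - 1] and adcs[i] < adcs[i + 1]:
            pieces.append((first_bin + start, adcs[start:i]))
            start = i
    pieces.append((first_bin + start, adcs[start:]))
    return pieces


# Two overlapping charge distributions with a dip at timebin 102.
pieces = split_at_minima(100, [2, 8, 3, 9, 4])
# A sequence without a dip is returned unchanged.
single = split_at_minima(200, [1, 5, 9, 4])
```

In real data, noise fluctuations can create spurious minima, so a practical version would probably require a minimum depth of the dip; that refinement is omitted here.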

The algorithm is inherently local, as each padrow can be
processed independently. This is one of the main reasons to
use a circuit for the parallel computation of the space points
on the FPGA of the RORC [7].

2) The Track Follower: The tracking algorithm is based on
conformal mapping. A space point (x,y) is transformed in the
following way:

$$ x' = \frac{x - x_t}{r^2}, \qquad y' = \frac{y - y_t}{r^2}, \qquad r^2 = (x - x_t)^2 + (y - y_t)^2 \qquad (1) $$

1 The deconvolution can be switched on/off by a flag of the program

where the reference point (x_t, y_t) is a point on the trajectory
of the track. If the track is assumed to originate from the
interaction point, the reference point is replaced by the vertex
coordinates. The transformation has the property of
transforming the circular trajectories of the tracks into
straight lines. Since fitting straight lines is easier and much
faster than fitting circles (if we neglect the changes in the
weights of the points induced by the conformal mapping), the
effect of the transformation is to speed up the track fitting
procedure.
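
The circle-to-line property can be checked numerically. The sketch below assumes the mapping x' = (x - x_t)/r^2, y' = (y - y_t)/r^2 with r^2 = (x - x_t)^2 + (y - y_t)^2 and takes the reference point at the origin; the function name and the example circle are illustrative. For a circle through the origin with center (a, b), the relation x^2 + y^2 = 2ax + 2by holds, so the mapped points satisfy 2a x' + 2b y' = 1, which is a straight line.

```python
import math


def conformal_map(x, y, xt=0.0, yt=0.0):
    """Map a space point relative to the reference point (xt, yt)."""
    dx, dy = x - xt, y - yt
    r2 = dx * dx + dy * dy
    return dx / r2, dy / r2


# Circle through the origin with center (3, 4) and radius 5.
a, b, radius = 3.0, 4.0, 5.0
mapped = []
for t in (0.0, 0.5, 1.0, 1.5):
    x = a + radius * math.cos(t)
    y = b + radius * math.sin(t)
    mapped.append(conformal_map(x, y))

# Deviation of each mapped point from the line 2a*x' + 2b*y' = 1.
deviations = [abs(2 * a * xp + 2 * b * yp - 1.0) for xp, yp in mapped]
```

All deviations vanish up to rounding, so a linear fit in (x', y') recovers the circle center and hence the curvature.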

The track finding algorithm is a follow-your-nose algorithm,
in which the tracks are built by including space points close
to the fit [6]. The tracks are initiated by building track
segments, and the search starts at the outermost padrows. The
track segments are formed by linking space points which are
close in space. When a certain number of space points have
been linked together, the points are fitted to straight lines
in conformal space. The tracks are then extended by searching
for clusters which are close to the fit.

3) Track merging: Tracking can be done either locally on 
every sub-sector, on the sector level or on the complete TPC. 
In the first two scenarios, the tracks have to be merged across 
the detector boundaries. A simple and fast track merging 
procedure has been implemented for the TPC. The algorithm 
basically tries to match tracks which cross the detector
boundaries and whose differences in the helix parameters are
below a certain threshold. After the tracks have been merged,
a final track fit is performed in real space.
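
A threshold-based matching of this kind might look as follows. This is a sketch under stated assumptions, not the implemented merger: the parameter names (`kappa`, `psi`, `tgl`), the threshold values and the greedy best-match strategy are all hypothetical, the check that a track actually reaches the boundary is omitted, and angular wrap-around is ignored.

```python
def match_tracks(tracks_a, tracks_b, max_diff):
    """Pair tracks from two neighboring volumes by helix-parameter differences.

    tracks_a, tracks_b: lists of dicts holding helix parameters.
    max_diff: dict mapping parameter name -> maximum allowed difference.
    Returns a list of (index_a, index_b) pairs; each track is used at most once.
    """
    pairs = []
    used_b = set()
    for i, ta in enumerate(tracks_a):
        best, best_score = None, None
        for j, tb in enumerate(tracks_b):
            if j in used_b:
                continue
            diffs = {k: abs(ta[k] - tb[k]) for k in max_diff}
            if all(diffs[k] <= max_diff[k] for k in max_diff):
                # Sum of differences in units of the thresholds.
                score = sum(diffs[k] / max_diff[k] for k in max_diff)
                if best_score is None or score < best_score:
                    best, best_score = j, score
        if best is not None:
            used_b.add(best)
            pairs.append((i, best))
    return pairs


# Hypothetical track segments on either side of a sector boundary.
side_a = [
    {"kappa": 0.0010, "psi": 0.50, "tgl": 0.20},
    {"kappa": -0.0030, "psi": 1.20, "tgl": -0.40},
]
side_b = [
    {"kappa": -0.0031, "psi": 1.21, "tgl": -0.41},
    {"kappa": 0.0011, "psi": 0.51, "tgl": 0.19},
    {"kappa": 0.0100, "psi": 2.00, "tgl": 0.00},
]
limits = {"kappa": 0.0005, "psi": 0.05, "tgl": 0.05}
pairs = match_tracks(side_a, side_b, limits)
```

The matched pairs would then be refitted as single tracks in real space, as described above.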
Fig. 2. Integral tracking efficiency for HLT online and ALIROOT offline
reconstruction as a function of different particle multiplicities for B=0.4T.



4) Tracking performance: The tracking performance has
been studied and compared with the offline TPC reconstruction
chain. In the evaluation the following quantities have been
defined:

• Generated good track - A track which crosses at least 
40% of all padrows. In addition, it is required that half of 
the innermost 10% of the clusters are correctly assigned. 

• Found good track - A track for which the number of 
assigned clusters is at least 40% of the total number of 
padrows. In addition, the track should not have more than 
10% wrongly assigned clusters. 



• Found fake track - A track which has a sufficient number of
clusters assigned, but more than 10% wrongly assigned
clusters.

The tracking efficiency is the ratio of the number of found 
good tracks to the number of generated good tracks. The 
identical definitions have been used both for offline and HLT 
for comparison. 
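
The definitions above can be expressed compactly in code. The sketch below is illustrative only: the function names are hypothetical, the 10% limit is interpreted as a fraction of the clusters assigned to the track (an assumption, since the text does not state the denominator), and the number of padrows used in the example is an arbitrary round number rather than the real TPC geometry.

```python
def classify_found_track(n_assigned, n_wrong, n_padrows,
                         min_fraction=0.4, max_wrong_fraction=0.1):
    """Classify a reconstructed track as 'good', 'fake' or 'rejected'."""
    if n_assigned < min_fraction * n_padrows:
        return "rejected"          # too few clusters to count as found
    if n_wrong > max_wrong_fraction * n_assigned:
        return "fake"              # enough clusters, but too many wrong ones
    return "good"


def tracking_efficiency(n_found_good, n_generated_good):
    """Ratio of found good tracks to generated good tracks."""
    return n_found_good / n_generated_good


# Hypothetical tracks in a detector with 100 padrows.
labels = [
    classify_found_track(70, 5, 100),
    classify_found_track(70, 10, 100),
    classify_found_track(30, 0, 100),
]
efficiency = tracking_efficiency(90, 100)
```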

Figure 2 shows the comparison of the integral efficiency of
the HLT and offline reconstruction chains for different charged
particle multiplicities for a magnetic field of B=0.4T. We see
that up to dN/dy of 2000 the HLT efficiency is > 90%, but for
higher multiplicities the HLT code becomes too inefficient to
be used for physics evaluation. In this regime other approaches
have to be applied.

5) Timing performance: The TPC analysis in HLT is di- 
vided into a hierarchy of processing steps from cluster finding, 
track finding, track merging to track fitting. 




Fig. 3. HLT processing hierarchy for 1 TPC sector (= 6 subsectors),
from the Front-End Processors to the Event Processors.

Figure 3 shows the foreseen processing hierarchy for the
sequential approach. Cluster finding is done in parallel on each
Front-End Processor (FEP), whereas track finding and track
fitting are done sequentially on the sector level processors.
The final TPC tracks are then obtained on the event processors,
where the tracks are merged across the sector boundaries
and a final track fit is performed (compare Figure 1).



Fig. 4. Computing times measured on a P3 800 MHz dual processor
for different TPC occupancies and resolved with respect to the different
processing steps.

Figure 4 shows the required computing time measured
on a standard reference PC (footnote 2) for the different
processing steps and for different particle multiplicities. The error

2 800 MHz Twin Pentium III, ServerWorks Chipset, 256 kB L3 cache 
TABLE I
Integral computing time comparison (dN/dy=4000)

CPU time (seconds)    HLT    Offline
Cluster finder          6        106
Track finder           18         58



bars denote the standard deviation of the processing time for
the given event ensemble. For a particle multiplicity of
dN/dy=4000, about 24 seconds are required to process a complete
event, so 4800 CPUs would currently be required for the TPC
alone at an event rate of 200 Hz (footnote 3). Table I compares
the CPU time needed to reconstruct a TPC event (dN/dy=4000) for
HLT and offline. For offline, loading the data into memory is
also included in the measurement, while the HLT result only
includes the processing time, as memory accesses are handled
transparently by the publisher-subscriber model (footnote 4).
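
The CPU estimate follows directly from the per-event processing time and the target event rate. The short sketch below reproduces this arithmetic, reading Table I as 6 s and 18 s for the HLT cluster finder and track finder and 106 s and 58 s for their offline counterparts; the function name is illustrative, and communication and synchronization overhead is ignored, as in footnote 3.

```python
def cpus_needed(seconds_per_event, event_rate_hz):
    """CPUs required to keep up with the event rate, ignoring all overhead."""
    return seconds_per_event * event_rate_hz


hlt_seconds = 6 + 18          # cluster finder + track finder (Table I, HLT)
offline_seconds = 106 + 58    # cluster finder + track finder (Table I, offline)

hlt_cpus = cpus_needed(hlt_seconds, 200)

# Footnote 4: 17% of the offline time is spent on data loading.
offline_processing_only = offline_seconds * (1.0 - 0.17)
```

Even after subtracting the data loading share, the offline chain needs roughly 136 s per event in this reading, compared with 24 s for the HLT chain.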

A. Iterative tracking approach 

For large particle multiplicities clusters in the TPC start 
to overlap, and deconvolution becomes necessary in order to 
achieve the desired tracking efficiencies. The cluster shape is 
highly dependent on the track parameters, and in particular 
on the track crossing angles with the padrow and drift time. 
In order to properly deconvolute the overlapping clusters,
knowledge of the track parameters that produced the clusters
is necessary. For that purpose the Hough transform is suited,
as it can be applied directly to the raw ADC data, thus
providing an estimate of the track parameters. Once the track
parameters are known, the clusters can be fit to the known
shape, and the cluster centroids can be correctly reconstructed.
The cluster deconvolution is geometrically local, and can
therefore be performed in parallel on the raw data.

1) Hough Transform: The Hough transform is a standard 
tool in image analysis that allows recognition of global pat- 
terns in an image space by recognition of local patterns (ideally 
a point) in a transformed parameter space. The basic idea is 
to find curves that can be parametrized in a suitable parameter 
space. In its original form one determines a curve in parameter 
space for a signal corresponding to all possible tracks with a 
given parametric form to which it could possibly belong. All 
such curves belonging to the different signals are drawn in 
parameter space. That space is then discretized and entries are
stored in a histogram. If a peak in the histogram exceeds a
given threshold, the corresponding parameters are considered found.

As mentioned above, in ALICE the local track model is
a helix. In order to simplify the transformation, the detector
is divided into subvolumes in pseudo-rapidity η. If one restricts
the analysis to tracks originating from the vertex, the circular
track in the η-volume is characterized by two parameters: the
emission angle with the beam axis, ψ, and the curvature κ. The
transformation is performed from (R,φ)-space to (ψ,κ)-space

3 Estimate ignores any communication and synchronization overhead in 
order to operate HLT 

4 For offline 17 % of the time is needed for data loading. 



using the following equations: 

$$ R = \sqrt{x^2 + y^2}, \qquad \phi = \arctan\left(\frac{y}{x}\right), \qquad \kappa = \frac{2}{R} \sin(\phi - \psi) \qquad (2) $$

Each ADC value above a certain threshold transforms
into a sinusoidal line extending over the whole ψ-range
of the parameter space. All the corresponding bins in the
histogram are incremented by the corresponding ADC value.
The superposition of these point transformations produces a
maximum at the circle parameters of the track. The track
recognition is now done by searching for local maxima in
the parameter space.
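
The histogram filling and peak search can be sketched in a few lines. The example assumes the relations R = sqrt(x^2 + y^2), φ = arctan(y/x) and κ = (2/R) sin(φ - ψ) for tracks from the vertex; the function names, bin counts and parameter ranges are illustrative assumptions, only a single global maximum is searched for, and a single η-volume is considered.

```python
import math


def hough_histogram(points, psi_bins, kappa_bins):
    """Fill a (psi, kappa) histogram from (x, y, adc) points.

    psi_bins and kappa_bins are (number_of_bins, low_edge, high_edge).
    """
    n_psi, psi_lo, psi_hi = psi_bins
    n_kap, kap_lo, kap_hi = kappa_bins
    d_psi = (psi_hi - psi_lo) / n_psi
    d_kap = (kap_hi - kap_lo) / n_kap
    hist = [[0.0] * n_kap for _ in range(n_psi)]
    for x, y, adc in points:
        big_r = math.hypot(x, y)
        phi = math.atan2(y, x)
        # Each point contributes one entry per psi bin: a sinusoidal line.
        for i in range(n_psi):
            psi = psi_lo + (i + 0.5) * d_psi
            kappa = 2.0 / big_r * math.sin(phi - psi)
            if kap_lo <= kappa < kap_hi:
                j = min(int((kappa - kap_lo) / d_kap), n_kap - 1)
                hist[i][j] += adc
    return hist


def highest_peak(hist):
    """Return the (psi_index, kappa_index) of the global maximum."""
    best = (0, 0)
    for i, row in enumerate(hist):
        for j, value in enumerate(row):
            if value > hist[best[0]][best[1]]:
                best = (i, j)
    return best


# Simulated track from the vertex with psi = 0.105 rad, kappa = 0.0041 per cm.
true_psi, true_kappa = 0.105, 0.0041
points = []
for radius in range(90, 250, 10):
    phi = true_psi + math.asin(true_kappa * radius / 2.0)
    points.append((radius * math.cos(phi), radius * math.sin(phi), 10.0))

hist = hough_histogram(points, (100, -0.5, 0.5), (100, -0.01, 0.01))
peak = highest_peak(hist)
```

All 16 simulated points fall into the same bin at the true parameters, whereas at neighboring ψ values their κ values spread over several bins, which is what makes the peak stand out.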
Fig. 5. Tracking efficiency for the Hough transform on a high occupancy
event. The overall efficiency is above 90%.

Figure 5 shows the tracking efficiency for the Hough transform
applied to a full multiplicity event and a magnetic field
of 0.2T. An overall efficiency above 90% was achieved. The
tracking efficiency was taken as the number of verified track
candidates divided by the number of generated tracks within
the TPC acceptance. The list of verified track candidates was
obtained by taking the list of found local maxima and laying
a road in the raw data corresponding to the track parameters
of the peak. If enough clusters were found along the road,
the track candidate was considered a track; if not, the track
candidate was discarded.

However, one of the problems encountered with the Hough 
transform algorithm is the number of fake tracks coming from 
spurious peaks in the parameter space. Before the tracks are 
verified by looking into the rawdata, the number of fake tracks 
is currently above 100%. This problem has to be solved in 
order for the tracks found by the Hough transform to be used 
as an efficient input for the cluster fitting and deconvoluting 
procedure. 

2) Timing performance: Figure 6 shows a timing measurement
of the Hough based algorithm for different particle
multiplicities. The Hough transform is done in parallel
locally on each receiving node, whereas the histogram
adding, maxima finding and merging of tracks across η-slices
are done sequentially on the sector level. The histograms from
the different subsectors are added in order to increase the
signal-to-noise ratio of the peaks. For particle multiplicities
of dN/dy=8000, the four steps require about 1000 seconds
per event, corresponding to 200,000 CPUs for a 200 Hz event
processing rate. It should be noted that the algorithms were
already optimized, but some additional optimizations are still
believed to be possible. However, present studies indicate that
one should not expect to gain more than a factor of 2 without
using hardware specifics of a given processor architecture.

Fig. 6. Computation time measured on an 800 MHz processor (Pentium III)
for different TPC occupancies and resolved with respect to the different
processing steps for the Hough transform approach.

The advantage of the Hough transform is that it has a very 
high degree of locality and parallelism, allowing the efficient 
use of FPGA co-processors. Given the hierarchy of the TPC 
data analysis, it is obvious that both the Hough transformation 
and the cluster deconvolution can be performed in the receiver 
nodes. The Hough transformation is particularly I/O-bound, as
it creates large histograms that have to be searched for maxima,
which scales poorly on modern processor architectures and
is ideally suited for FPGA co-processors. Currently, different
ways of implementing the Hough transform outlined above in
hardware are being investigated.

IV. Data modeling and Data compression 

One of the main goals of the HLT is to compress data
efficiently with a minimal loss of physics information.

In general two modes of data compression can be consid- 
ered: 

• Binary lossless data compression, allowing bit-by-bit
reconstruction of the original data set.

• Binary lossy data compression, not allowing bit-by-bit
reconstruction of the original data, while nevertheless
retaining all relevant physical information.

Run-length encoding (RLE), Huffman and LZW are considered
lossless compression, while thresholding and hit finding
operations are considered lossy techniques that could lead to
a loss of small clusters or tails of clusters. It should be noted
that data compression techniques in this context should be
considered lossless from a physics point of view.
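
The difference between the two modes can be illustrated on a short ADC sequence. The sketch below is a toy example with hypothetical function names and data: run-length encoding restores the input bit by bit, whereas thresholding removes the low-charge tails of the cluster and cannot be inverted.

```python
def rle_encode(samples):
    """Run-length encode a list of samples into (value, count) pairs."""
    runs = []
    for value in samples:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Invert rle_encode exactly."""
    samples = []
    for value, count in runs:
        samples.extend([value] * count)
    return samples


def apply_threshold(samples, threshold):
    """Zero all samples below the threshold (lossy)."""
    return [s if s >= threshold else 0 for s in samples]


adc = [0, 0, 0, 1, 5, 12, 4, 1, 0, 0, 0, 0]
encoded = rle_encode(adc)
restored = rle_decode(encoded)
thresholded = apply_threshold(adc, 3)
```

Here the samples with value 1 on both sides of the cluster are lost after thresholding, which corresponds to the loss of cluster tails mentioned in the text.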

Many of the state of the art compression techniques were
studied on simulated TPC data and are presented in detail in [8].
They all result in compression factors close to 2. However,
the most effective data compression can be done by cluster
and track modeling, as will be presented in the following.



TABLE II
Track parameters and their respective size

Track parameters      Size (Byte)
Curvature             4 (float)
X0, Y0, Z0            4 (float)
Dip angle             4 (float)
Azimuthal angle       4 (float)
Track length          2 (integer)
Number of clusters    1 (integer)


TABLE III
Cluster parameters and their respective size

Cluster parameters    Size (Bit)
Cluster present        1
Pad residual           9
Time residual          9
Cluster charge        13



A. Cluster and track modeling 

From a data compression point of view, the aim of
the track finding is not to extract physics information, but
to build a data model which will be used to collect clusters 
and to code cluster information efficiently. Therefore, the 
pattern recognition algorithms are optimized differently, or 
even different methods can be used compared to the normal 
tracking. 

The tracking analysis comprises two main steps: cluster
reconstruction and track finding. Depending on the occupancy,
the space points can be determined by simple cluster finding
or require more complex cluster deconvolution functionality in
areas of high occupancy (see Sec. III and III-A). In the latter
case a minimum track model may be required in order to
properly decode the digitized charge clouds into their correct
space points.

However, in any case the analysis process is twofold:
clustering and tracking. Optionally, the first step can be
performed online while leaving the tracking to offline, thereby
recording only the space points. Given the high resolution of
the space points on the one hand and the size of the chamber on
the other, this would result in rather large encoding sizes for
these clusters. However, taking a preliminary zeroth order
tracking into account, the space points can be encoded with
respect to their distance to such tracklets, leaving only small
numbers which can be encoded very efficiently. The quality of
the tracklet itself, with the helix parameters that would also
be recorded, is only secondary, as the tracking is repeated
offline with the original cluster positions.

B. Data compression scheme 

The input to the compression algorithm is a list of tracks
and their corresponding clusters. For every assigned cluster,
the cluster centroid deviation from the track model is
calculated in both pad and time direction. This length is
quantized with respect to the given detector resolution
(footnote 5), and represented by a fixed number of bits. In
addition the total charge of
5 The quantization steps have been set to 0.5 mm for the pad direction and
0.8 mm for the time direction, which is within the range of the intrinsic
detector resolution.
the cluster is stored. Since the cluster shape itself can be
parametrized as a function of track parameters and detector
specific parameters, the cluster widths in pad and time are
not stored for every cluster. During the decompression step,
the cluster centroids are restored, and the cluster shape is
calculated based on the track parameters. In Tables II and III
the track and cluster parameters are listed together with their
respective sizes as used in the compression.
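
The residual encoding can be illustrated with a small packing routine. Only the field widths (1, 9, 9 and 13 bits, Table III) and the quantization steps (0.5 mm and 0.8 mm, footnote 5) are taken from the text; the bit order within the word, the two's-complement representation of the residuals, the clamping of out-of-range values and all function names are assumptions made for this sketch.

```python
PAD_STEP_MM = 0.5    # quantization step in pad direction (footnote 5)
TIME_STEP_MM = 0.8   # quantization step in time direction (footnote 5)


def quantize(residual_mm, step_mm, bits=9):
    """Quantize a residual to a signed integer of the given width (clamped)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, round(residual_mm / step_mm)))


def pack_cluster(pad_residual_mm, time_residual_mm, charge):
    """Pack one cluster into 1 + 9 + 9 + 13 = 32 bits (assumed layout)."""
    pad_q = quantize(pad_residual_mm, PAD_STEP_MM) & 0x1FF
    time_q = quantize(time_residual_mm, TIME_STEP_MM) & 0x1FF
    charge_q = min(int(charge), 0x1FFF)   # charge assumed non-negative
    return (1 << 31) | (pad_q << 22) | (time_q << 13) | charge_q


def unpack_cluster(word):
    """Restore (present, pad residual, time residual, charge) from a word."""
    def signed9(value):
        return value - 512 if value >= 256 else value

    present = word >> 31
    pad_mm = signed9((word >> 22) & 0x1FF) * PAD_STEP_MM
    time_mm = signed9((word >> 13) & 0x1FF) * TIME_STEP_MM
    charge = word & 0x1FFF
    return present, pad_mm, time_mm, charge


word = pack_cluster(1.3, -2.1, 350)
present, pad_mm, time_mm, charge = unpack_cluster(word)
```

The restored residuals differ from the inputs by at most half a quantization step, and with the field widths of Table III each assigned cluster occupies exactly 32 bits in this layout.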
Fig. 7. Comparison of the tracking efficiency of the offline reconstruction
chain before and after data compression. A total loss of efficiency of ~1%
was observed.
Fig. 8. Comparison of the p_t resolution of the offline reconstruction chain
before and after data compression.

The compression scheme was applied to a simulated PbPb
event with a multiplicity of dN/dy=1000. The input tracks
used were tracks reconstructed with the sequential tracking
approach (see Fig. 2). The remaining clusters, i.e. the clusters
which were not assigned to any track during the track finding
step, were disregarded and not stored for further analysis
(footnote 6). A relative size of 11% for the compressed data
with respect to the original set was obtained. In order to
evaluate the impact on the physics observables, the data were
decompressed and the restored clusters were processed by the
offline reconstruction chain.



In Figure 7 the offline tracking efficiency before and after
applying the compression is compared. A total loss of ~1% in
efficiency and no significant loss in p_t resolution (Figure 8)
were observed.

However, keeping the potential gain of statistics by the 
increased event rate written to tape in mind, one has to weigh 
the tradeoff between the impacts on the physics observables 
and the costs for the data storage. 

For high occupancy events, clusters start to overlap and have
to be properly deconvoluted in order to effectively compress
the data. In this scenario, the Hough transform or another
effective iterative tracking procedure would serve as an input
for the cluster fitting/deconvolution algorithm. With a high
online tracking performance, track and cluster modeling,
together with noise removal, can reduce the data size by a
factor of 10.

V. Conclusion 

Focusing on the TPC, the sequential approach, consisting
of cluster finding followed by track finding, is applicable for
pp and low multiplicity PbPb data up to dN/dy of 2000 to 3000
with more than 90% efficiency. The timing results indicate that
the desired frequency of 1 kHz for pp and 200 Hz for PbPb
can be achieved. For higher multiplicities of dN/dy > 4000
the iterative approach using the circle Hough transform for
primary track candidate finding shows promising efficiencies
of around 90%, but at high computational cost.

By compressing the data using data modeling, results show
that one can compress the data to about 10% of the original
size with a very small impact on the tracking efficiency and
the p_t resolution.
6 The remaining clusters mainly originate from very low p_t tracks such
as δ-electrons, which could not be reconstructed by the track finder. Their
uncompressed raw data amounts to a relative size of about 20%.



